Including prosodic cues in ASR systems

Authors

Diego H. MILONE

Antonio J. RUBIO

Abstract

Several aspects of speech production and natural speech perception have gradually been incorporated into automatic speech recognition (ASR) systems. Nevertheless, the prosodic characteristics of speech have not yet been used explicitly in the recognition process itself. This work presents an analysis of the three most important prosodic parameters (energy, fundamental frequency, and duration) together with a method for incorporating this information into automatic speech recognition. Prosodic-accentual features are incorporated into a hidden Markov model (HMM) recognizer, and their theoretical formulation and experimental setup are presented. Several experiments on a Spanish continuous speech database illustrate how the method behaves. Building on this understanding, experiments with other subsets of the database show that incorporating prosodic-accentual cues can reduce the word recognition error by more than 30%.
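As a rough illustration of what the three prosodic parameters look like as frame-level measurements, the Python sketch below computes short-time log-energy, an autocorrelation-based fundamental frequency (F0) estimate, and a crude duration measure (total voiced time). It is not the authors' method. The 25 ms frames with a 10 ms hop, the 60 to 400 Hz pitch range, the voicing threshold, and all function names are assumptions chosen for illustration. How the resulting features enter the HMM recognizer is not reproduced here.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (drops the trailing partial frame)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def log_energy(frame, floor=1e-10):
    """Short-time energy: log of the frame's mean squared amplitude, floored for silence."""
    return math.log(max(sum(x * x for x in frame) / len(frame), floor))


def autocorr_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0, voicing_threshold=0.3):
    """Estimate F0 from the highest normalized autocorrelation peak in the allowed lag range.

    Returns 0.0 for frames judged unvoiced. The pitch range and threshold are
    illustrative assumptions, not values taken from the paper.
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0:
        return 0.0
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag > 0 and best_r >= voicing_threshold:
        return sample_rate / best_lag
    return 0.0


def prosodic_features(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Hypothetical helper returning per-frame energy and F0, plus total voiced duration."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = frame_signal(signal, frame_len, hop)
    energies = [log_energy(f) for f in frames]
    f0s = [autocorr_f0(f, sample_rate) for f in frames]
    # Crude duration proxy: number of voiced frames times the hop length, in seconds.
    voiced_frames = sum(1 for f in f0s if f > 0)
    return {
        "log_energy": energies,
        "f0": f0s,
        "voiced_duration_s": voiced_frames * hop / sample_rate,
    }


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]  # 0.5 s, 200 Hz
    feats = prosodic_features(tone, sr)
    print(feats["f0"][:3], feats["voiced_duration_s"])
```

For a 0.5 s, 200 Hz test tone sampled at 8 kHz, every frame should report an F0 of about 200 Hz. A practical system would replace this naive pitch tracker with a more robust one and measure duration per phone or syllable from a forced alignment rather than from total voiced time.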

Similar resources

Production of English Lexical Stress by Persian EFL Learners

This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...

Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine Spanish

We present the results of a series of machine learning experiments aimed at exploring the differences and similarities in the production of turn-taking cues in American English and Argentine Spanish. An analysis of prosodic features automatically extracted from 21 dyadic conversations (12 En, 9 Sp) revealed that, when signaling Holds, speakers of both languages tend to use roughly the same comb...

Generalizing prosodic prediction of speech recognition errors

Since users of spoken dialogue systems have difficulty correcting system misconceptions, it is important for automatic speech recognition (ASR) systems to know when their best hypothesis is incorrect. We compare results of previous experiments which showed that prosody improves the detection of ASR errors to experiments with a new system and new domain, the W99 conference registration system. O...

Language identification on code-switching utterances using multiple cues

Code-switching speech is an utterance containing two or more languages. Usually, the switching linguistic unit is in clause or word levels. In this paper, a two-stage framework is proposed, containing a language identifier and then a speech recognizer, to evaluate on a Mandarin-Taiwanese codeswitching utterance. In the language identifier, we use multiple cues including acoustic, prosodic and p...

Predicting Automatic Speech Recognition Performance Using Prosodic Cues

In spoken dialogue systems, it is important for a system to know how likely a speech recognition hypothesis is to be correct, so it can reprompt for fresh input, or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have discovered prosodic features which more accurately predict when a recognition hypothesis contains a word e...

Publication date: 2001